Skip to content

Comments

perf fix Time#142

Merged
danovaro merged 8 commits intodevelopfrom
fix/perf-regex-time
Aug 20, 2025
Merged

perf fix Time#142
danovaro merged 8 commits intodevelopfrom
fix/perf-regex-time

Conversation

@mcakircali
Copy link
Contributor

try to avoid std::regex

@mcakircali mcakircali requested review from Ozaq and danovaro October 11, 2024 12:23
@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2024

Codecov Report

❌ Patch coverage is 86.40000% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.74%. Comparing base (99c3ed6) to head (a8900f1).

Files with missing lines Patch % Lines
src/eckit/types/Time.cc 86.40% 17 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #142   +/-   ##
========================================
  Coverage    65.73%   65.74%           
========================================
  Files         1131     1131           
  Lines        57446    57508   +62     
  Branches      4327     4332    +5     
========================================
+ Hits         37760    37806   +46     
- Misses       19686    19702   +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@Ozaq Ozaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are at this class, I would also encourage you to add some class level doc and describe the accepted format variants and to be expected exceptions

Comment on lines 148 to 150
throw BadTime("Unkown format for time: " + s);
} else {
throw SeriousBug("Unhandled time format!");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a need for two different exceptions? I understand the intent of those exception types as follows:

  • BadTime is best described as a usage error, as in it was called as Time("ABCD")
  • SeriousBug represents an implementation error such as Time("20:11:24") not getting parsed properly.

Is that second throw giving any additional benefits?


// DIGITS: "^-?[0-9]+$"
// FLOAT: "^-?[0-9]*\\.[0-9]+$"
enum class TimeFormat { UNKOWN, OTHER, DIGITS, DECIMAL };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNKNOWN is treated as erroneous input, it go stronger in the name and call it INVALID

ss = sec;
} else if (format == TimeFormat::OTHER) {
std::smatch m;
if (std::regex_match(s, m, hhmmss_)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still using a regex.

ss += 60 * (mm + 60 * (hh + 24 * dd));
if (s[0] == '-') {
ss = -ss;
} else if (std::regex_match(s, m, ddhhmmss_)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still using a regex.

Comment on lines 40 to 49
for (auto i = start; i < time.length(); i++) {
if (time[i] == '.') {
if (hasDecimal || i == time.length() - 1) { return TimeFormat::UNKOWN; }
hasDecimal = true;
} else if (isdigit(time[i]) == 0) {
return TimeFormat::OTHER;
} else {
hasDigit = true;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be done with a single pass over the input string and then return the typeid of the found variant (pretty similar to what you are doing already) plus up to 5 tokens, 1 token for the sign and 4 for positive integers (day, hour, minute, second).

Subsequent code can then validate on this result, e.g. was the extended flag passed.

I am thinking along the lines of:

struct tokenized_time {
    FormatType type;
    std::string_view sign;
    std::string_view integers[4];
};

@simondsmart simondsmart added the approved-for-ci Approved for CI run label Feb 19, 2025
@danovaro danovaro merged commit eb8a4a1 into develop Aug 20, 2025
222 of 228 checks passed
@danovaro danovaro deleted the fix/perf-regex-time branch August 20, 2025 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved-for-ci Approved for CI run

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants